Introduction: Multiple myeloma (MM) is a common, incurable blood malignancy. It is consistently preceded by a precursor condition, monoclonal gammopathy of undetermined significance (MGUS). Patients diagnosed with MM often present with multimorbidity, particularly type 2 diabetes mellitus (T2DM). It is therefore critical to identify risk factors for the progression of MGUS to MM in patients with both T2DM and MGUS. Prior studies have identified several risk/protective factors for MM, such as obesity (Chang, JNCI 2016), monoclonal (M)-protein level (Chang, CEBP 2019), and metformin use (Chang, Lancet Haematol 2015). However, these factors were analyzed as static variables observed at a single time point (e.g., baseline), without using the time-varying features that can capture their dynamics and cumulative exposure. Because these data are documented in patients' electronic health records, we used advanced machine learning techniques to identify dynamic markers that predict the progression to MM in patients with T2DM and MGUS.

Methods: Data from the nationwide Veterans Health Administration were used. Published natural language processing models (Wang, JCO CCI 2023) were used to identify individuals diagnosed with MGUS from 1999 to 2024 and to confirm their MM diagnoses. Patients with a T2DM diagnosis prior to MGUS, identified via ICD codes and confirmed by treatment, were included. A machine learning model was developed to predict the progression of MGUS to MM. We considered the following time-varying markers: body mass index (BMI); levels of M-protein, creatinine, hemoglobin A1C, potassium, sodium, calcium, glucose, and total protein; and T2DM medications (metformin, sulfonylureas, insulin, sodium-glucose transport protein 2 inhibitors, glucagon-like peptide-1 receptor agonists, dipeptidyl peptidase-4 (DPP-4) inhibitors, and thiazolidinediones). The observation window was defined as the period from MGUS diagnosis to progression to MM or censoring. For each marker, we then generated ad-hoc features by taking its mean, variance, maximum, minimum, max change/month, min change/month, and area under the curve for changes (AUTCC). The AUTCC integrates the change in the feature into one metric. Mean dosage per year was calculated for each T2DM medication. Static features, including sex, race, Agent Orange exposure, ages at MGUS and T2DM diagnosis, MGUS subtype, free light chain (FLC) ratio at MGUS diagnosis, and year of MGUS diagnosis, were also included in the model. The cohort was randomly split into training (70%) and testing (30%) sets. To overcome the imbalanced ratio of MM to non-MM (~1:9), we applied the Synthetic Minority Oversampling Technique (SMOTE) to the training set. Significant predictors were identified through Hilbert-Schmidt Independence Criterion (HSIC) Lasso regression, which recursively computed the importance of each predictor using the training set. Fifteen of 78 features were selected to develop machine learning models with the training set.
A random forest (RF) classifier was selected for its robustness to feature collinearity. Ten-fold cross-validation was used to evaluate the model performance in the testing set. To quantify each feature's contribution towards the prediction of MM, the mean of the absolute SHapley Additive exPlanations (SHAP) values across the samples was computed. For each sample, a positive SHAP value indicates a positive impact on the risk of progression to MM and a negative value indicates the opposite, with a higher absolute value indicating a larger impact.
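
The per-marker feature generation described above can be illustrated with a minimal Python sketch. It is not the study's implementation: the function name `dynamic_features`, the use of sample variance, and the example M-protein values are hypothetical. In particular, the abstract does not give a formula for the AUTCC, so the sketch assumes it is the trapezoidal area under the curve of change from the first (baseline) measurement over time in months.

```python
from statistics import mean, variance


def dynamic_features(months, values):
    """Summarize one marker's trajectory within the observation window.

    months: measurement times in months since MGUS diagnosis (strictly increasing)
    values: marker measurements at those times
    """
    if len(months) != len(values) or len(values) < 2:
        raise ValueError("need at least two paired measurements")
    if any(t1 <= t0 for t0, t1 in zip(months, months[1:])):
        raise ValueError("months must be strictly increasing")

    # Change per month between consecutive measurements.
    rates = [
        (v1 - v0) / (t1 - t0)
        for t0, t1, v0, v1 in zip(months, months[1:], values, values[1:])
    ]

    # ASSUMPTION: AUTCC = trapezoidal area under the change-from-baseline curve.
    deltas = [v - values[0] for v in values]
    autcc = sum(
        (t1 - t0) * (d0 + d1) / 2.0
        for t0, t1, d0, d1 in zip(months, months[1:], deltas, deltas[1:])
    )

    return {
        "mean": mean(values),
        "variance": variance(values),  # sample variance (an assumption)
        "max": max(values),
        "min": min(values),
        "max_change_per_month": max(rates),
        "min_change_per_month": min(rates),
        "autcc": autcc,
    }


if __name__ == "__main__":
    # Hypothetical M-protein measurements (g/dL) at 0, 6, 12, and 24 months.
    features = dynamic_features([0, 6, 12, 24], [1.0, 1.2, 1.1, 1.7])
    for name, value in features.items():
        print(f"{name}: {value:.4f}")
```

Under this assumed definition, a marker that rises early and stays elevated accumulates a larger AUTCC than one that rises by the same amount only at the end of the window, which is one way a single number can reflect cumulative exposure.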

Results: We included 21,755 MGUS patients with T2DM. The RF model achieved mean precision=0.95, recall=0.78, and F1-score=0.86 for predicting MM in the testing set. The top dynamic markers contributing to the prediction of progression ranked as follows: AUTCC and min change/month of M-protein (mean |SHAP|=0.029 and 0.017, respectively), max potassium change/month (0.016), min BMI change/month (0.016), BMI AUTCC (0.015), min creatinine (0.012), and mean DPP-4 inhibitor dosage per year (0.011).
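
As a quick consistency check on the reported metrics, the F1-score is the harmonic mean of precision and recall. The short sketch below recomputes it from the reported values; the helper name `f1_from` is hypothetical, and because the reported figures are means, the identity is expected to hold only approximately.

```python
def f1_from(precision, recall):
    """F1-score as the harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


# Reported mean precision and recall for predicting MM in the testing set.
f1 = f1_from(0.95, 0.78)
print(round(f1, 2))
```

The recomputed value rounds to the reported F1-score of 0.86.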

Conclusions: Machine learning models, e.g., RF, can incorporate dynamic markers in risk prediction. For patients with T2DM and MGUS, the identified top dynamic markers and T2DM medication can predict progression of MGUS to MM. These findings, if further validated in other populations and databases, can inform potential interventions to deter progression of MGUS to MM and manage T2DM in vulnerable patient populations with MGUS and T2DM.
